Unsupervised WSD with a Dynamic Thesaurus

Authors

  • Javier Tejada-Cárcamo
  • Hiram Calvo
  • Alexander Gelbukh
Abstract

Diana McCarthy et al. (ACL 2004) determine the predominant sense of an ambiguous word from a weighted thesaurus of words related to it. This thesaurus is built with Dekang Lin’s (COLING-ACL 1998) distributional similarity method. Lin averages distributional similarity over the whole training corpus; the list of related words in his thesaurus is therefore given for a word as a type and not as a token, i.e., it does not depend on the context in which the word occurs. We observed that constructing a list similar to Lin’s thesaurus, but for a specific context, converts the method of McCarthy et al. into a word sense disambiguation method. With this new method we obtained a precision of 69.86%, which is 7% higher than the supervised baseline.
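
The idea in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the McCarthy et al. ranking, in which each related word casts a vote for every sense of the target word, weighted by its distributional similarity and normalized over the senses. The function name `rank_senses`, the sense labels, the neighbor lists, and every number below are invented for illustration; they are not taken from the paper, and the sense-to-neighbor similarities stand in for whatever WordNet-based measure an implementation would use.

```python
def rank_senses(neighbors, sense_similarity, senses):
    """Rank senses of a target word by weighted votes from related words.

    neighbors: dict mapping related word -> distributional weight
    sense_similarity: dict mapping (sense, related word) -> similarity >= 0
    senses: list of candidate sense labels for the target word
    """
    scores = {sense: 0.0 for sense in senses}
    for neighbor, weight in neighbors.items():
        # Normalize this neighbor's vote across all senses of the target.
        total = sum(sense_similarity.get((s, neighbor), 0.0) for s in senses)
        if total == 0:
            continue  # neighbor is unrelated to every sense; it casts no vote
        for sense in senses:
            share = sense_similarity.get((sense, neighbor), 0.0) / total
            scores[sense] += weight * share
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data for the word "bank" (all values are made up).
SENSES = ["bank/finance", "bank/river"]
SENSE_SIMILARITY = {
    ("bank/finance", "money"): 0.9, ("bank/river", "money"): 0.1,
    ("bank/finance", "loan"): 0.8, ("bank/river", "loan"): 0.2,
    ("bank/finance", "river"): 0.1, ("bank/river", "river"): 0.9,
    ("bank/finance", "shore"): 0.1, ("bank/river", "shore"): 0.9,
}

# Static thesaurus: one neighbor list per word type, averaged over the corpus.
STATIC_NEIGHBORS = {"money": 0.4, "loan": 0.3, "river": 0.1}

# Dynamic thesaurus: a neighbor list built for one specific context,
# e.g. an occurrence of "bank" in a sentence about fishing.
DYNAMIC_NEIGHBORS = {"river": 0.5, "shore": 0.4, "money": 0.05}

static_ranking = rank_senses(STATIC_NEIGHBORS, SENSE_SIMILARITY, SENSES)
dynamic_ranking = rank_senses(DYNAMIC_NEIGHBORS, SENSE_SIMILARITY, SENSES)
```

With the static list the ranking function can only return one answer per word type, which is the predominant-sense setting. Swapping in a context-specific list, while leaving the ranking function unchanged, lets the top-ranked sense vary from one occurrence to another, which is what turns the procedure into a disambiguation method.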

Similar resources

Word Sense Disambiguation with Spreading Activation Networks Generated from Thesauri

Most word sense disambiguation (WSD) methods require large quantities of manually annotated training data and/or do not exploit fully the semantic relations of thesauri. We propose a new unsupervised WSD algorithm, which is based on generating Spreading Activation Networks (SANs) from the senses of a thesaurus and the relations between them. A new method of assigning weights to the networks’ li...

Semantic Distances for Sets of Senses and Applications in Word Sense Disambiguation

There has been an increasing interest both from the Information Retrieval community and the Data Mining community in investigating possible advantages of using Word Sense Disambiguation (WSD) for enhancing semantic information in the Information Retrieval and Data Mining process. Although contradictory results have been reported, there are strong indications that the use of WSD can contribute t...

HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation

This paper presents an unsupervised system for all-word domain specific word sense disambiguation task. This system tags target word with the most frequent sense which is estimated using a thesaurus and the word distribution information in the domain. The thesaurus is automatically constructed from bilingual parallel corpus using paraphrase technique. The recall of this system is 43.5% on SemEv...

From Predicting Predominant Senses to Local Context for Word Sense Disambiguation

Recent work on automatically predicting the predominant sense of a word has proven to be promising (McCarthy et al., 2004). It can be applied (as a first sense heuristic) to Word Sense Disambiguation (WSD) tasks, without needing expensive hand-annotated data sets. Due to the big skew in the sense distribution of many words (Yarowsky and Florian, 2002), the First Sense heuristic for WSD is often...

Class Based Sense Definition Model for Word Sense Tagging and Disambiguation

We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on Class Based Sense Definition Model (CBSDM) that generates the glosses and translations for a class of word senses. The model can be applied to resolve sense amb...

Publication date: 2007